Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor/encode decode via lut #40

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

mattiZed
Copy link
Contributor

@mattiZed mattiZed commented Apr 29, 2024

Okay, here we go. This should hopefully finally fix #39 and #37 as well as #35.

I have taken some inspiration from flexpolyline, I think the performance improvement on the encoder side comes mainly from computing the scaled longitude/latidude only once for each new coordinate pair - compare the loop in encode_coordinates(). When we subtract previous from current, previous is already scaled.

EDIT: Seeing the performance gains made in #42 I'd rather attribute the improved encoder performance to removing the string concatenation and directly pusing to the result buffer.

I have changed the code to use the LUT implementation when it was faster on my local machine (M2 Pro). I choose to leave the respective alternative in the code. I encourage you to test on your system as well, maybe you report differently.

The main improvement regarding #39 and #37 comes from explicitly computing and comparing against the maximum shift that would still provide a valid longitude/latitude coordinate.

Timings compared to latest main:

     Running benches/benchmarks.rs (target/release/deps/benchmarks-f896454355081916)
bench encode: 1000 coordinates
                        time:   [10.713 µs 10.744 µs 10.777 µs]
                        change: [-83.109% -83.025% -82.947%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

bench decode: 21502 coordinates
                        time:   [277.77 µs 280.55 µs 283.09 µs]
                        change: [+51.144% +52.512% +53.845%] (p = 0.00 < 0.05)
                        Performance has regressed.

bench polyline6 decoding
                        time:   [54.661 ns 54.814 ns 54.986 ns]
                        change: [+35.941% +36.468% +36.938%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild

bench HUGE polyline6 decoding
                        time:   [58.936 µs 59.531 µs 60.308 µs]
                        change: [+96.872% +99.934% +103.05%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe

I don't quite know what to say about decoder performance. The conditionals sure are responsible for the performance hit, but it's still rather quick and not as abysmal as other solutions - I assume we'd have to sacrifice a bit of performance for robustness here?

mattiZed and others added 4 commits April 28, 2024 20:08
@michaelkirk michaelkirk mentioned this pull request May 1, 2024
@michaelkirk
Copy link
Member

This is really interesting and I think worth pursuing.

My understanding is that you've done at least 2 big things to make things faster:

  1. avoided allocating a bunch of strings (like I redundantly did in speed up encode #42)
  2. implemented a lookup table

My #42 does only the first part, and isn't quite as fast. But it's still a big boost: -70%-75%, and seemingly without any of the related slowdowns, so it feels not at all controversial.

How would you feel about starting by merging #42, or you (@mattiZed) could pull the string allocation part out of this PR into a separate PR, I'd be happy to have your name on the commits if you'd prefer that - especially since you wrote it first. 🙃

My motivation is that, it might be illuminating to evaluate what exactly we're getting from the LUT without muddying the waters, since seemingly most of the speedup is from avoiding all the string allocations.

@mattiZed
Copy link
Contributor Author

mattiZed commented May 1, 2024

I agree 100%. i will remove string allocation improvement from this pr so it can be merged first. then merging your pr after should be straight forward. will do it tomorrow.

@michaelkirk
Copy link
Member

I was thinking let's merge the string allocation improvement first, since it's strictly an improvement.

@mattiZed
Copy link
Contributor Author

mattiZed commented May 2, 2024

Either is fine for me, will update this PR now.

@mattiZed
Copy link
Contributor Author

mattiZed commented May 2, 2024

And here are the new timings:

Most of the encoding gain is/was from pushing to the same buffer all around, and will be retrieved again from #42.

     Running benches/benchmarks.rs (target/release/deps/benchmarks-f896454355081916)
bench encode: 1000 coordinates
                        time:   [55.520 µs 55.782 µs 56.073 µs]
                        change: [-12.412% -12.012% -11.609%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

bench decode: 21502 coordinates
                        time:   [280.85 µs 282.70 µs 284.43 µs]
                        change: [+56.166% +57.259% +58.311%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild

bench polyline6 decoding
                        time:   [53.460 ns 53.707 ns 53.999 ns]
                        change: [+34.154% +34.893% +35.661%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

bench HUGE polyline6 decoding
                        time:   [65.463 µs 66.191 µs 67.074 µs]
                        change: [+135.64% +138.94% +142.48%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

@michaelkirk
Copy link
Member

Would you mind rebasing and running against the new benchmarks now that #42 is merged? I think/hope they'll be a little more stable.

@mattiZed
Copy link
Contributor Author

mattiZed commented May 7, 2024

Hi Mike, im on vacation for this and next week and will continue work on this after. However, you could also take over if you'd like/it's urgent in any sense.

Best,
Stefan

This was referenced May 7, 2024
@michaelkirk
Copy link
Member

Is this LUT implementation something you're still interested in merging @mattiZed?

Since we've, in the meanwhile merged a subset of the changes, the remaining work was some conflicts that need to be addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error handling of bad polyline still broken
2 participants